Quiz: One-Step Dynamics

Consider the recycling robot example. In the previous concept, we described one method that the environment could use to decide the next state and reward at any time step.

Say at an arbitrary time step t, the state of the robot's battery is high (S_t = \text{high}). Then, in response, the agent decides to search (A_t = \text{search}). You learned in the previous concept that in this case, the environment responds to the agent by flipping a theoretical coin with 70% probability of landing heads.

If the coin lands heads, the environment decides that the next state is high (S_{t+1} = \text{high}), and the reward is 4 (R_{t+1} = 4).
If the coin lands tails, the environment decides that the next state is low (S_{t+1} = \text{low}), and the reward is 4 (R_{t+1} = 4).

This is depicted in the figure below.

In fact, for any state S_t and action A_t, it is possible to use the figure to determine exactly how the environment will decide the next state S_{t+1} and reward R_{t+1}.
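
The coin flip described above for S_t = high and A_t = search can be sketched in code. This is only an illustrative sketch: the function name `step_high_search`, the constant `HEADS_PROBABILITY`, and the encoding of states as plain strings are assumptions made here, not part of the lesson. It covers only the single state-action pair spelled out in the text.

```python
import random

# From the text: the coin lands heads with 70% probability.
HEADS_PROBABILITY = 0.7


def step_high_search(rng):
    """Sample (next_state, reward) for S_t = high, A_t = search.

    Hypothetical helper: states are encoded as the strings "high" and "low".
    """
    if rng.random() < HEADS_PROBABILITY:
        # Heads: the battery stays high, reward 4.
        return "high", 4
    # Tails: the battery drops to low, reward is still 4.
    return "low", 4


# Draw many transitions and check how often the next state is high.
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [step_high_search(rng) for _ in range(10_000)]
fraction_high = sum(1 for state, _ in samples if state == "high") / len(samples)
print(f"fraction of transitions to high: {fraction_high:.3f}")
```

Over many samples the printed fraction should sit close to 0.7, while the reward is 4 on every draw, since the coin in this case affects only the next state.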

SOLUTION:

The next state is high, and the reward is 1.

SOLUTION:

The next state is high, and the reward is 0.